naive classifier
Enhancing naive classifier for positive unlabeled data based on logistic regression approach
Płatek, Mateusz, Mielniczuk, Jan
We argue that, for the analysis of Positive Unlabeled (PU) data under the Selected Completely At Random (SCAR) assumption, it is fruitful to view the problem as fitting a misspecified model to the data. Namely, we show that results on misspecified fits imply that, when the posterior probability of the response is modelled by logistic regression, fitting logistic regression to the observable PU data, which does not follow this model, still yields a vector of estimated parameters approximately collinear with the true parameter vector. This observation, together with choosing the intercept of the classifier by optimising an analogue of the F1 measure, yields a classifier that performs on par with or better than its competitors on several real data sets.
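The intercept-selection step can be illustrated with a minimal sketch. It assumes the linear scores from the naive fit are already available, and it uses one well-known PU surrogate for F1, recall² / P(predicted positive) (proposed by Lee and Liu), which can be computed from labeled and unlabeled examples alone. The abstract does not specify which analogue the authors optimise, so the criterion, the function names, and the toy data below are illustrative assumptions, not the paper's method.

```python
def pu_f1_analogue(scores, labeled, threshold):
    """Return recall^2 / P(predicted positive) for a given score threshold.

    scores  : linear scores from a classifier fitted to PU data
    labeled : 1 for a labeled positive, 0 for an unlabeled example
    Recall is estimated on the labeled positives only.
    """
    predicted = [sc >= threshold for sc in scores]
    n_labeled = sum(labeled)
    n_predicted = sum(predicted)
    if n_labeled == 0 or n_predicted == 0:
        return 0.0
    hits = sum(1 for p, s in zip(predicted, labeled) if p and s == 1)
    recall = hits / n_labeled
    return recall ** 2 / (n_predicted / len(scores))


def choose_threshold(scores, labeled):
    """Pick the observed score that maximises the PU criterion."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: pu_f1_analogue(scores, labeled, t))


# Toy data (hypothetical): three labeled positives among seven examples.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
labeled = [1, 1, 0, 1, 0, 0, 0]

best_threshold = choose_threshold(scores, labeled)
best_value = pu_f1_analogue(scores, labeled, best_threshold)
# Predicting positive when score >= threshold corresponds to
# shifting the linear score by an intercept of -threshold.
intercept = -best_threshold
```

Only the intercept is tuned here; the direction of the coefficient vector is taken from the naive fit, which is the part the abstract argues is approximately preserved under misspecification.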
What Is the Naive Classifier for Each Imbalanced Classification Metric?
A common mistake made by beginners is to apply machine learning algorithms to a problem without establishing a performance baseline. A performance baseline provides a minimum score above which a model is considered to have skill on the dataset. It also provides a point of relative improvement for all models evaluated on the dataset. A baseline can be established using a naive classifier, such as predicting one class label for all examples in the test dataset. Another common mistake made by beginners is using classification accuracy as a performance metric on problems that have an imbalanced class distribution.
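As a small illustration of both points, the sketch below scores two naive classifiers on a hypothetical data set with a 99:1 class ratio. The data and helper functions are assumptions made for this example and use only the standard library. Always predicting the majority class achieves high accuracy while having no skill on the minority class, which F1 exposes.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """F1 for the positive class; defined as 0.0 when there are no true positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced test set: 990 negatives, 10 positives.
y_true = [0] * 990 + [1] * 10

# Naive classifier 1: always predict the majority class.
majority_label = Counter(y_true).most_common(1)[0][0]
pred_majority = [majority_label] * len(y_true)

# Naive classifier 2: always predict the minority (positive) class.
pred_minority = [1] * len(y_true)

acc_majority = accuracy(y_true, pred_majority)
f1_majority = f1(y_true, pred_majority)
acc_minority = accuracy(y_true, pred_minority)
f1_minority = f1(y_true, pred_minority)
```

On this toy data the majority-class baseline is the one to beat for accuracy, while the minority-class baseline gives the nonzero F1 reference point, so the appropriate naive classifier depends on the metric being reported.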
Fairness With Minimal Harm: A Pareto-Optimal Approach For Healthcare
Martinez, Natalia, Bertran, Martin, Sapiro, Guillermo
Common fairness definitions in machine learning focus on balancing notions of disparity and utility. In this work, we study fairness in the context of risk disparity among sub-populations. We are interested in learning models that minimize performance discrepancies across sensitive groups without causing unnecessary harm. This is relevant to high-stakes domains such as healthcare, where non-maleficence is a core principle. We formalize this objective using Pareto frontiers, and provide analysis, based on recent works in fairness, to exemplify scenarios where perfect fairness might not be feasible without doing unnecessary harm. We present a methodology for training neural networks that achieve our goal by dynamically re-balancing subgroup risks. We argue that even in domains where fairness at cost is required, finding a non-unnecessary-harm fairness model is the optimal initial step. We demonstrate this methodology on real case studies of predicting ICU patient mortality and classifying skin lesions from dermatoscopic images.
A Machine Learning Approach for Detecting Students at Risk of Low Academic Achievement
Cornell-Farrow, Sarah, Garrard, Robert
We aim to predict whether a primary school student will perform in the 'below standard' band of a national standardized test. We exploit a data set containing test performance on the National Assessment Program - Literacy and Numeracy (NAPLAN), a test given annually to all Australian school students in grades 3, 5, 7, and 9. We separate the analysis into students in grade 5 and above, for whom previous achievement may be used as a predictor, and students in grade 3, for whom the analysis must rely on family- and school-level predictors only. We train and compare a set of classifiers for the reading and numeracy learning areas respectively. The classifiers achieve good predictive power in terms of area under the ROC curve, suggesting that it is feasible for schools to more accurately screen a large number of students for academic risk.